NSF PAR Search | NSF Public Access Repository

Efficient centroid-linkage clustering

Bateni, Mohammad; Dhulipala, Laxman; Fletcher, Willem; Gowda, Kishen; Hershkowitz, D Ellis; Jayaram, Rajesh; Łącki, Jakub (June 2025, Proceedings of the 38th International Conference on Neural Information Processing Systems)

Full Text Available
Improved Spectral Density Estimation via Explicit and Implicit Deflation

Bhattacharjee, Rajarshi; Jayaram, Rajesh; Musco, Cameron; Musco, Christopher; Ray, Archan (January 2025, ACM-SIAM Symposium on Discrete Algorithms)

We study algorithms for approximating the spectral density (i.e., the eigenvalue distribution) of a symmetric matrix A ∈ ℝ^{n×n} that is accessed through matrix-vector product queries. Recent work has analyzed popular Krylov subspace methods for this problem, showing that they output an ε · ||A||_2 error approximation to the spectral density in the Wasserstein-1 metric using O(1/ε) matrix-vector products. By combining a previously studied Chebyshev polynomial moment matching method with a deflation step that approximately projects off the largest magnitude eigendirections of A before estimating the spectral density, we give an improved error bound of ε · σ_ℓ(A) using O(ℓ log n + 1/ε) matrix-vector products, where σ_ℓ(A) is the ℓth largest singular value of A. In the common case when A exhibits fast singular value decay and so σ_ℓ(A) ≪ ||A||_2, our bound can be much stronger than prior work. We also show that it is nearly tight: any algorithm giving error ε · σ_ℓ(A) must use Ω(ℓ + 1/ε) matrix-vector products. We further show that the popular Stochastic Lanczos Quadrature (SLQ) method essentially matches the above bound for any choice of parameter ℓ, even though SLQ itself is parameter-free and performs no explicit deflation. Our bound helps to explain the strong practical performance and observed ‘spectrum adaptive’ nature of SLQ, and motivates a simple variant of the method that achieves an even tighter error bound. Technically, our results require a careful analysis of how eigenvalues and eigenvectors are approximated by (block) Krylov subspace methods, which may be of independent interest. Our error bound for SLQ leverages an analysis of the method that views it as an implicit polynomial moment matching method, along with recent results on low-rank approximation with single-vector Krylov methods. We use these results to show that the method can perform ‘implicit deflation’ as part of moment matching.
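The moment-estimation ingredient mentioned in the abstract can be illustrated with a minimal sketch. It estimates the normalized Chebyshev moments tr(T_k(A))/n using only matrix-vector products and random sign vectors. It is not the paper's algorithm: it performs no deflation, it assumes the spectrum of A already lies in [-1, 1], and the names `chebyshev_moments`, `matvec`, and `num_probes` are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def chebyshev_moments(matvec, n, num_moments, num_probes=10, seed=0):
    """Estimate tr(T_k(A)) / n for k = 0..num_moments.

    `matvec` is the only access to A (assumed symmetric, spectrum in [-1, 1]).
    Each probe uses num_moments matrix-vector products.
    """
    rng = random.Random(seed)
    moments = [0.0] * (num_moments + 1)
    for _ in range(num_probes):
        # Random sign (Rademacher) probe vector.
        g = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        t_prev = g[:]  # T_0(A) g = g
        moments[0] += dot(g, t_prev)
        if num_moments == 0:
            continue
        t_curr = matvec(g)  # T_1(A) g = A g
        moments[1] += dot(g, t_curr)
        for k in range(2, num_moments + 1):
            # Three-term recurrence: T_k(A) g = 2 A T_{k-1}(A) g - T_{k-2}(A) g
            a_t = matvec(t_curr)
            t_next = [2.0 * x - y for x, y in zip(a_t, t_prev)]
            moments[k] += dot(g, t_next)
            t_prev, t_curr = t_curr, t_next
    return [m / (num_probes * n) for m in moments]


# Usage: a diagonal matrix, accessed only through matvec.
diagonal = [0.5, -0.5, 1.0, 0.0]
estimates = chebyshev_moments(
    lambda v: [d * x for d, x in zip(diagonal, v)], n=4, num_moments=2
)
```

For a diagonal matrix the sign-vector estimate is exact, which makes the example easy to check by hand; for general matrices the estimate is random and improves with more probes.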
Full Text Available
Data-Dependent LSH for the Earth Mover’s Distance

https://doi.org/10.1145/3618260.3649666

Jayaram, Rajesh; Waingarten, Erik; Zhang, Tian (June 2024, ACM)

Full Text Available
Streaming Algorithms with Few State Changes

https://doi.org/10.1145/3651145

Jayaram, Rajesh; Woodruff, David P; Zhou, Samson (May 2024, Proceedings of the ACM on Management of Data)

In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these metrics fail to capture the asymmetric costs, inherent in modern hardware and database systems, of reading versus writing to memory. In fact, most streaming algorithms write to their memory on every update, which is undesirable when writing is significantly more expensive than reading. This raises the question of whether streaming algorithms with both small space and a small number of memory writes are possible. We first demonstrate that, for the fundamental F_p moment estimation problem with p ≥ 1, any streaming algorithm that achieves a constant factor approximation must make Ω(n^{1-1/p}) internal state changes, regardless of how much space it uses. Perhaps surprisingly, we show that this lower bound can be matched by an algorithm which also has near-optimal space complexity. Specifically, we give a (1+ε)-approximation algorithm for F_p moment estimation that uses a near-optimal ~O_ε(n^{1-1/p}) number of state changes, while simultaneously achieving near-optimal space, i.e., for p∈[1,2), our algorithm uses poly(log n, 1/ε) bits of space, while for p>2, the algorithm uses ~O_ε(n^{1-1/p}) space. We similarly design streaming algorithms that are simultaneously near-optimal in both space complexity and the number of state changes for the heavy-hitters problem, sparse support recovery, and entropy estimation. Our results demonstrate that an optimal number of state changes can be achieved without sacrificing space complexity.
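To make the notion of a "state change" concrete, the sketch below uses the classical Morris approximate counter, which is a different and much simpler algorithm than the ones in the paper. It counts stream updates while writing to its stored state only when a random test succeeds, so the number of writes grows roughly logarithmically in the stream length. The class and attribute names are hypothetical, and the random number generator is treated as external to the algorithm's memory, which is a simplifying assumption.

```python
import random


class MorrisCounter:
    """Approximate counter that rarely modifies its stored state."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)  # assumed external randomness
        self.exponent = 0                # the only stored state
        self.state_changes = 0           # bookkeeping for illustration only

    def update(self):
        # Write to memory with probability 2^(-exponent); otherwise read only.
        if self._rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1
            self.state_changes += 1

    def estimate(self):
        # 2^exponent - 1 is an unbiased estimate of the number of updates.
        return 2 ** self.exponent - 1


# Usage: process 10,000 updates and compare writes against stream length.
counter = MorrisCounter(seed=0)
stream_length = 10_000
for _ in range(stream_length):
    counter.update()
print(counter.state_changes, "state changes for", stream_length, "updates")
```

A single Morris counter has high variance; it is shown here only because its write pattern is easy to see, not because it achieves the guarantees stated in the abstract.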
Full Text Available
Near-Linear Time Algorithm for the Chamfer Distance

Bakshi, Ainesh; Indyk, Piotr; Jayaram, Rajesh; Silwal, Sandeep; Waingarten, Erik (December 2023, Advances in neural information processing systems)

Full Text Available
It’s Hard to HAC Average Linkage!

https://doi.org/10.4230/LIPIcs.ICALP.2024.18

Bateni, MohammadHossein; Dhulipala, Laxman; Gowda, Kishen N; Hershkowitz, D Ellis; Jayaram, Rajesh; Łącki, Jakub (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (Ed.)

Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-linear-time and efficient parallel algorithms for average linkage HAC. We provide hardness results that rule out such algorithms. On the sequential side, we establish a runtime lower bound of n^{3/2-ε} on n-node graphs for sequential combinatorial algorithms under standard fine-grained complexity assumptions. This essentially matches the best-known running time for average linkage HAC. On the parallel side, we prove that average linkage HAC likely cannot be parallelized even on simple graphs by showing that it is CC-hard on trees of diameter 4. On the possibility side, we demonstrate that average linkage HAC can be efficiently parallelized (i.e., it is in NC) on paths and can be solved in near-linear time when the height of the output cluster hierarchy is small.
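For readers unfamiliar with the procedure whose complexity is studied here, the following is a naive sketch of average linkage HAC on a weighted graph. It assumes the common graph-based definition in which the similarity of two clusters is the total edge weight between them divided by the product of their sizes, and it breaks ties by dictionary order. It is a brute-force baseline for illustration, not an algorithm from the paper, and the function name and input format are hypothetical.

```python
def average_linkage_hac(n, edges):
    """Greedy average linkage HAC on a graph with nodes 0..n-1.

    `edges` maps (u, v) pairs to similarity weights. Returns the merges as
    (cluster_a, cluster_b, similarity); merged clusters get ids n, n+1, ...
    """
    sizes = {i: 1 for i in range(n)}
    # Total edge weight between each pair of adjacent clusters.
    between = {}
    for (u, v), weight in edges.items():
        key = (min(u, v), max(u, v))
        between[key] = between.get(key, 0.0) + weight

    merges = []
    next_id = n
    while between:
        # Pick the adjacent pair with the largest average linkage similarity.
        (a, b), total = max(
            between.items(),
            key=lambda item: item[1] / (sizes[item[0][0]] * sizes[item[0][1]]),
        )
        similarity = total / (sizes[a] * sizes[b])
        sizes[next_id] = sizes.pop(a) + sizes.pop(b)

        # Redirect edges of a and b to the merged cluster, summing weights.
        updated = {}
        for (x, y), w in between.items():
            if (x, y) == (a, b):
                continue
            x2 = next_id if x in (a, b) else x
            y2 = next_id if y in (a, b) else y
            key = (min(x2, y2), max(x2, y2))
            updated[key] = updated.get(key, 0.0) + w
        between = updated

        merges.append((a, b, similarity))
        next_id += 1
    return merges


# Usage: a path 0 - 1 - 2 with a strong edge and a weak edge.
merges = average_linkage_hac(3, {(0, 1): 3.0, (1, 2): 1.0})
```

Each round scans all remaining cluster pairs, so this sketch is far slower than the running times discussed in the abstract; it only shows what the merge rule computes.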
Full Text Available
Streaming Euclidean MST to a Constant Factor

https://doi.org/10.1145/3564246.3585168

Chen, Xi; Cohen-Addad, Vincent; Jayaram, Rajesh; Levi, Amit; Waingarten, Erik (June 2023, Proceedings of the 55th ACM Symposium on Theory of Computing (STOC))

A Framework for Adversarially Robust Streaming Algorithms

https://doi.org/10.1145/3498334

Ben-Eliezer, Omri; Jayaram, Rajesh; Woodruff, David P.; Yogev, Eylon (April 2022, Journal of the ACM)

We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinear-space deterministic algorithms; on the other hand, classical space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally F_p-estimation, F_p-heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust (1+ε)-approximation algorithms whose required space matches that of the best known non-robust algorithms up to a poly(log n, 1/ε) multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a non-robust streaming algorithm into a robust one in various scenarios.
Full Text Available